Microsoft Translator publicly releases speech translation corpus
As part of an ongoing effort within Microsoft to improve the accuracy of artificial intelligence (AI) systems, Microsoft Translator is publicly releasing a data set of conversations between bilingual speakers of French, German and English. The corpus, which Microsoft produced using bilingual speakers, is intended to serve as a standard benchmark against which people can measure how well their conversational speech translation systems perform, including systems such as the Microsoft Translator live feature and Skype Translator. Christian Federmann, a senior program manager on the Microsoft Translator team, said there aren't many standardized data sets for testing bilingual conversational speech translation systems. "You need high-quality data in order to have high-quality testing," Federmann said.